fix: search_business_context falls through to catalog when empty #63

Closed
shawnxiao105-afk wants to merge 1 commit into datahub-project:main from shawnxiao105-afk:fix/search-business-context-fallthrough

@shawnxiao105-afk
Contributor

Summary

When search_business_context returns empty across all four sub-searches (documentation / glossary terms / domains / data products), the skill now fires a dataset search by name and includes the results as catalog_fallback. The agent receives the URN and metadata directly instead of relying on the SKILL.md's soft fall-through guidance, which the LLM was not reliably following.

What changes

  • _search_business_context_impl: factors out _all_results_empty; when true, fires search(query=topic, filter="entity_type = dataset") and merges the results under a catalog_fallback key. A _followup_hint field points the agent at it.
  • search-business-context/SKILL.md: replaces the end-of-doc soft footnote with a labeled section describing the new auto-fallthrough. Includes a "do not call list_tables" note — manual testing showed the LLM otherwise routes to the SQL engine and hallucinates sample content.
  • Unit tests covering populated / errored / partial-shape / all-empty cases for _all_results_empty.
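
The emptiness check and fall-through described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the result shape (one dict per sub-search with a `results` list or an `error` field), the key names in `SUB_SEARCHES`, the hint wording, the injected `search` callable, and the choice to treat an errored sub-search as "not empty" and a missing section as "empty" are all assumptions. Only the `catalog_fallback` and `_followup_hint` keys and the `search(query=topic, filter="entity_type = dataset")` call come from the description.

```python
from typing import Any, Callable

# Hypothetical key names for the four sub-searches.
SUB_SEARCHES = ("documentation", "glossary_terms", "domains", "data_products")


def all_results_empty(results: dict[str, Any]) -> bool:
    """Return True only if no sub-search errored and none returned hits."""
    for key in SUB_SEARCHES:
        section = results.get(key)
        if not isinstance(section, dict):
            continue  # assumption: a missing or malformed section counts as empty
        if section.get("error"):
            return False  # assumption: an error means "unknown", not "empty"
        if section.get("results"):
            return False  # at least one hit, so business context exists
    return True


def with_catalog_fallback(
    topic: str,
    results: dict[str, Any],
    search: Callable[..., Any],
) -> dict[str, Any]:
    """Attach a dataset search under catalog_fallback when everything is empty."""
    if not all_results_empty(results):
        return results
    merged = dict(results)  # shallow copy so the caller's dict is not mutated
    merged["catalog_fallback"] = search(query=topic, filter="entity_type = dataset")
    merged["_followup_hint"] = (
        "No business context found; see catalog_fallback for matching datasets."
    )
    return merged


# Usage with a stub standing in for the real catalog search.
def fake_search(query: str, filter: str) -> dict[str, Any]:
    return {"results": [{"urn": "urn:li:dataset:example", "name": query}]}


empty = {key: {"results": []} for key in SUB_SEARCHES}
out = with_catalog_fallback("SampleHiveDataset", empty, fake_search)
print(sorted(out))
```

Passing the search function in as a parameter keeps the sketch testable without a running DataHub instance; the real implementation may call its search tool directly.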

Test plan

  • uv run pytest tests/unit/test_skill_business_context.py — 4 passed
  • uv run ruff check: clean on the touched files
  • Manual end-to-end against datahub docker quickstart: asking "tell me about SampleHiveDataset" now produces a correct catalog summary (URN, platform Hive, owners, Legacy tag) instead of the "doesn't exist" answer documented in #61 (Agent says dataset "doesn't exist" when there's no docs for it). The LLM still occasionally calls SQL engine tools afterward, but the primary answer is grounded in real catalog metadata.

Closes #61

When all four business-context sub-searches return zero hits, the skill
now also fires a dataset search by name and surfaces the results as a
`catalog_fallback` key. This prevents the agent from treating "no
governed documentation" as "entity doesn't exist in the catalog."

The SKILL.md fallback was previously a soft footnote that the LLM did
not reliably follow — verified locally that, with only a hint in the
tool result, the agent would skip to the SQL engine's `list_tables` and
hallucinate sample content. With the fallback wired into the
implementation, the agent receives the URN and metadata directly and
produces an accurate catalog summary.

Closes datahub-project#61
@shawnxiao105-afk
Contributor Author

Closing — reconsidering the approach; the fix is over-engineered for the scope. Will revisit with a smaller scope.

@shawnxiao105-afk deleted the fix/search-business-context-fallthrough branch on May 26, 2026 23:45

Development

Successfully merging this pull request may close these issues.

Agent says dataset "doesn't exist" when there's no docs for it
